Multi-Stage Programming for GPUs in Modern C++ using PACXX

Authors

Michael Haidl

Michel Steuwer

Tim Humernbrum

Sergei Gorlatch

Abstract

Writing and optimizing programs for high performance on systems with GPUs remains a challenging task even for expert programmers. One promising optimization technique is to evaluate parts of the program upfront on the CPU and embed the computed results in the GPU code, which allows for more aggressive compiler optimizations. This technique is known as multi-stage programming and has been shown to yield significant performance benefits. Unfortunately, to achieve such optimizations in current GPU programming models like OpenCL, programmers are forced to manipulate the GPU source code as plain strings, which is error-prone and not type-safe. In this paper we describe PACXX, a GPU programming approach based on modern C++ standards, with convenient features such as type deduction, lambda expressions, and algorithms from the Standard Template Library (STL). Using PACXX, a GPU program is written as a single C++ program, rather than as two distinct host and kernel programs. We extend PACXX with an easy-to-use and type-safe API for multi-stage programming that avoids the pitfalls of string manipulation. Using just-in-time compilation techniques, PACXX generates efficient GPU code at runtime. Our evaluation shows that PACXX makes writing multi-stage code easier and safer than is currently possible. Using two detailed application studies, we show that multi-stage programs can significantly outperform equivalent non-staged programs. Furthermore, we show that PACXX generates code whose performance is comparable to that of code produced by industrial-strength OpenCL compilers.
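To make the string-manipulation problem mentioned in the abstract concrete, the following is a minimal sketch in standard C++ of how a runtime value is typically staged into OpenCL-style kernel source. It is not taken from the paper and does not use the PACXX API: the helper `make_staged_kernel_source`, the kernel name `weighted_sum`, and the parameter `taps` are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of PACXX or OpenCL): stages a value that is
// known on the CPU at runtime into GPU kernel source by pasting it into text.
// With the loop bound fixed as a literal, the kernel compiler can, for
// example, fully unroll the loop.
std::string make_staged_kernel_source(int taps) {
    return
        "__kernel void weighted_sum(__global const float* in,\n"
        "                           __global float* out) {\n"
        "    size_t i = get_global_id(0);\n"
        "    float acc = 0.0f;\n"
        "    for (int k = 0; k < " + std::to_string(taps) + "; ++k)\n"
        "        acc += in[i + k];\n"
        "    out[i] = acc;\n"
        "}\n";
}
```

The C++ compiler sees the kernel only as a string, so a typo or a type mismatch in the generated text goes unnoticed until the kernel source is compiled at runtime. This is the kind of pitfall the abstract says a type-safe staging API is meant to avoid.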

Similar resources

Programming GPUs with C++14 and Just-In-Time Compilation

Systems that comprise accelerators (e.g., GPUs) promise high performance, but their programming is still a challenge, mainly because of two reasons: 1) two distinct programming models have to be used within an application: one for the host CPU (e.g., C++), and one for the accelerator (e.g., OpenCL or CUDA); 2) using Just-In-Time (JIT) compilation and its optimization opportunities in both OpenC...

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

A multi-stage stochastic programming for condition-based maintenance with proportional hazards model

Condition-Based Maintenance (CBM) optimization using Proportional Hazards Model (PHM) is a kind of maintenance optimization problem in which inspections of a system relevant to its failure rate depending on the age and value of covariates are performed in time intervals. The general approach for constructing a CBM based on PHM for a system is to minimize a long run average cost per unit of time...

Development of a New Methodology for Improving Urban Fast Response Lagrangian Dispersion Simulation via Parallelism on the Graphics Processing Unit

Recent trends in computing have shifted toward multi-core processors and programmable graphics processors with highly parallel data paths for processing geometry and pixels. Multi-core machines are now readily available with 2 cores, but machines with 4, 8, and even 16 cores are projected for the near future. Data parallelism in modern graphics cards is also increasing with raw per...

A hybrid solution approach for a multi-objective closed-loop logistics network under uncertainty

The design of closed-loop logistics (forward and reverse logistics) has attracted growing attention with the stringent pressures of customer expectations, environmental concerns and economic factors. This paper considers a multi-product, multi-period and multi-objective closed-loop logistics network model with regard to facility expansion as a facility location–allocation problem, which more cl...

Publication date: 2016
